# Skybridge-3D-CMOS: A Fine-Grained 3D CMOS Integrated Circuit Technology

Mingyu Li, Student Member, IEEE, Jiajun Shi, Student Member, IEEE, Mostafizur Rahman, Member, IEEE, Santosh Khasanvis, Member, IEEE, Sachin Bhat, Student Member, IEEE, and Csaba Andras Moritz, Senior Member, IEEE

Abstract—Parallel and monolithic three-dimensional (3-D) integration directions realize 3-D integrated circuits (ICs) by utilizing layer-by-layer implementations, with each functional layer being composed in 2-D. In contrast, vertically composed 3-D CMOS has eluded us likely due to the seemingly insurmountable requirement of highly customized complex routing and regional 3-D doping to form and connect CMOS pull-up and pull-down networks in 3-D. In the current layer-by-layer directions, routing can be worse than 2D CMOS because of the limited pin access. In this paper, we propose Skybridge-3D-CMOS (S3DC), an IC fabric that shows for the first time a pathway to achieve fine-grained static CMOS circuit implementations using the vertical direction while also solving 3-D routability. It employs a new fabric assembly scheme based on predoped vertical nanowire bundles. It implements circuits in and across nanowires. It utilizes unique connectivity features to achieve CMOS connectivity in 3-D with excellent routability. As compared to the usually severely congested monolithic 3-D implementations, S3DC eliminates the routing congestions in all benchmarks studied. Further results, for the implemented benchmarks, show 56–77% reductions in power consumption, 4X–90X increases in density, and 20% loss to 9% benefit in best operating frequencies compared with the transistor-level monolithic 3-D technology.

*Index Terms*—3D connectivity, 3D designs, fine-grained 3D integration, routability, Skybridge-3D-CMOS.

## I. INTRODUCTION

THREE-DIMENSIONAL integration is an emerging technology direction that can overcome many of the current limitations of traditional CMOS scaling, including interconnection bottlenecks. However, it is considered impractical to build fine-grained static CMOS circuits directly using vertically composed approaches. One major reason is that such technologies would require regional 3D doping to form and connect CMOS pull-up and pull-down networks in 3D, as well as the associated routing. Because of these seemingly infeasible requirements, research to date has focused mainly on incremental technology changes based on 2D CMOS. These include parallel integration with Through-Silicon-Vias (TSVs) [1]–[3] and monolithic integration at gate-level (G-MI) and transistor-level (T-MI) granularity [4]–[7]. They are based on die-to-die and layer-to-layer stacking. These 3D technologies cause congestion by significantly reducing routability compared with 2D CMOS [8]. Other recent 3D IC directions include a dynamic-style Skybridge [9]–[12]. This fabric is based on the premise that vertically composed fine-grained static CMOS is seemingly too challenging to realize. Therefore, it uses a dynamic circuit style that eliminates the complex routing and doping requirements entirely. However, it leads to circuit designs that are not compatible with static CMOS and is a more radical departure from what industry is currently using. So the question remains: can we build a vertically composed 3D IC fabric for static CMOS while preserving its routability properties?

Manuscript received December 27, 2016; accepted March 14, 2017. Date of publication May 2, 2017; date of current version July 7, 2017. This work was supported in part by the National Science Foundation under Grant 1407906, and in part by the Center for Hierarchical Manufacturing (CHM, NSF DMI-0531171) at UMass Amherst. The review of this paper was arranged by Associate Editor P.-E. Gaillardon. (Corresponding author: Mingyu Li).

- M. Li, J. Shi, S. Bhat, and C. A. Moritz are with the University of Massachusetts, Amherst, MA 01002 USA (e-mail: mingyul@umass.edu; jiajun@umass.edu; sachinbalach@umass.edu; andras@ecs.umass.edu).
- M. Rahman is with the University of Missouri, Kansas City, MO 64110 USA (e-mail: rahmanmo@umkc.edu).
- S. Khasanvis is with BlueRiSC, Inc., Amherst, MA 01002 USA (e-mail: santosh@bluerisc.com).
- Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNANO.2017.2700626

In this paper, we present Skybridge-3D-CMOS (S3DC), the first vertically-composed fine-grained CMOS 3D IC technology that also has a high degree of routability [13]. It is enabled by a systematic way of designing static CMOS circuits in a skeleton-style nanowire structure. All the circuits are built on the uniform vertical nanowire template, which is pre-doped with p- and n-type horizontal stripes. To form the pull-up and pull-down networks containing series / parallel connections, series networks are built with devices implemented on one nanowire, and parallel networks are built with devices on different nanowires, following a simple systematic approach. A specially designed fabric component called Skybridge-Interlayer-Connection (SB-ILC) connects the p-type pull-up and n-type pull-down networks together to generate the output signal. Other fabric structures provide connectivity between transistors in both vertical and horizontal dimensions; top-level metal layers are often not necessary (and not assumed in this paper). Arbitrary static CMOS gates can be designed following an assembly process focused primarily on material deposition. The overall manufacturing requirements do not depart from those of the dynamic Skybridge discussed in [9]–[12].

To analyze routability in 3D, we look at the *pin access* available for a logic gate in each technology. The pin access of a conventional standard cell layout is the number of points inside a cell where a pin can be placed for inter-cell routing. It reflects the ability to route a gate-level design without pin congestion. If the pin access count is too small, there is a greater chance that the pin access points become fully occupied before all signals are connected. Congested pins also make the wires connecting those pins more congested. Pin access density is also important; it is the total pin access per unit area and shows how many gates can be routed in a unit area without pin congestion.

The 2D and layered 3D CMOS directions do gate-level routing by connecting the underlying planar logic cells with the upper wiring layers through the pins. In these cases, the pin access is proportional to the surface area of the logic cells. Inevitably, T-MI has decreased pin access due to its smaller footprint, and suffers from more pin congestion. In S3DC, on the other hand, pin access is significantly better even though the cell footprint is much smaller than that of T-MI. This is because S3DC pin access is no longer limited by the cell footprint: its routing scheme makes better use of the vertical dimension. When S3DC performs gate-level routing, inter-gate wires access the 3D gate layouts through vertical routing elements in 3D space, which creates more pin access points along the vertical axis.

1536-125X © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications.standards/publications/rights/index.html for more information.

Fig. 1. Pin accesses of the NAND2X1 in each technology (pin access number can differ in various layouts): (a) 2D NAND2X1 has 4 and 6 pin accesses for input A and B; (b) T-MI NAND2X1 only has 2 and 3 pin accesses for input A and B due to the smaller footprint and the area occupied by the Monolithic Interlayer Vias (MIVs); (c) S3DC has 5 and 9 pin accesses for input A and B in the layout studied.

TABLE I
PIN ACCESS AND PIN ACCESS DENSITY OF 2-INPUT NAND IN VARIOUS IC FABRICS AT 16 NM

| Fabric  | Cell    | Total Pin Access Number per Cell | Pin Density (count per µm<sup>2</sup>) | Footprint (nm<sup>2</sup>) |
|---------|---------|----------------------------------|----------------------------------------|----------------------------|
| 2D CMOS | NAND2X1 | 12–18                            | 61–92                                  | 1.97e+5                    |
| T-MI    | NAND2X1 | 6–9                              | 55–83                                  | 1.1e+5                     |
| S3DC    | NAND2X1 | 15–27                            | 685–1027                               | 2.2e+4                     |

Fig. 1 and Table I illustrate the pin access of a 2-input NAND layout in 2D CMOS, T-MI, and S3DC. Although the pin access count differs somewhat for various layouts, S3DC is clearly more than an order of magnitude better than the other technologies in pin access density, and slightly better than 2D CMOS in the pin access count of each input signal. The higher pin access density shows that more S3DC gates can be routed in a unit area without pin congestion. The somewhat better pin access vs. 2D CMOS shows that S3DC cells are easier to access despite the 9X smaller 2-input NAND cell footprint. This observation is also validated through the benchmarks. No routing congestion is found in the interconnect-dominated LDPC benchmark; in fact, S3DC still has 20% unused routing resources in the most congested layer, which is slightly better than the 2D CMOS design. It is important to note that these S3DC designs are routed only with the interconnection components within the nanowire template, without using dedicated metal routing layers on top of logic cells. More details on the routing analysis are included in Section V.

To evaluate S3DC technology against T-MI and 2D CMOS, we have also developed a system-level methodology incorporating commercial CAD tools. While these tools are not yet fully optimized for S3DC, they allow us to derive performance metrics for comparison against other fabrics, albeit somewhat conservatively. We have implemented six benchmarks, including a 4-bit and a 16-bit multiplier, a 4-bit microprocessor, and circuits for LDPC, DES, and JPEG. Routability, as well as key metrics quantifying performance, power, and area, is evaluated against both planar CMOS and 3D T-MI. In all cases, we employ the 16-nm technology node.

In summary, the main contributions of this paper include:

- a) Developing the first fine-grained 3D CMOS IC technology leveraging the vertical dimension
- b) Achieving high routability despite the high density designs
- c) Adopting a system-level CAD tool suite enabling validation of larger circuits
- d) Providing a detailed quantitative comparison with 2D CMOS as well as state-of-the-art transistor-level monolithic 3D CMOS, showing groundbreaking potential

Fig. 2. (a) One single nanowire with striped doping; (b) Uniform vertical nanowire template; (c) SB-ILC allows routing between various doping layers without MIVs.

The rest of the paper is organized as follows. In Section II we introduce S3DC technology features that enable fine-grained 3D-CMOS. In Section III we briefly introduce the S3DC SRAM cell. In Section IV, we introduce the system-level design and evaluation methodology. In Section V, we present the benchmarking results in routability, performance, power, density, and thermal management. In Section VI various aspects of S3DC manufacturing are discussed, including a complete manufacturing pathway, experimental progress, manufacturing cost, and sensitivity analysis on different manufacturing parameters. Section VII concludes the paper.

## II. S3DC 3D-CMOS-ENABLING FEATURES

S3DC is a vertically-composed fine-grained 3D CMOS IC technology. It is enabled by novel fabric concepts.

1) First, all circuits are realized on a uniform vertical silicon nanowire template, shown in Fig. 2(A). We place and connect active devices on these nanowires either in series or in parallel (across multiple nanowires) to build CMOS circuits. Nanowire structures are also used to form connectivity, as discussed below. Template nanowires are pre-doped in horizontal stripes, enabling Junctionless nanowire device formation during material deposition. This striping is achieved with initial wafer bonding. A thin dielectric layer is deposited on top of the n-doped wafer, then a p-doped silicon layer is transferred onto the deposited dielectric layer using a molecular bonding technique [4]. Doping only occurs during template formation. Nanowires are formed through etching after a multi-doped full wafer is created. All benchmarks presented in this paper are based on 16nm-wide nanowires.

Fig. 3. (a) An n-type V-GAA Junctionless transistor in 16nm S3DC technology; (b) 3D connections within one doping layer realized by Bridges, Coaxial Routings, and routing nanowires; four signals A, B, C, D are carried in this example.

2) Parallel networks are built with devices on different vertical nanowires; these different nanowires are shorted together on both drain and source sides. SB-ILC is the structure that connects the p-type pull-up and n-type pull-down networks together to generate the output signal. We also wire several SB-ILCs together to short the nanowires and form a parallel network. The SB-ILC structure is shown in Fig. 2(B). It is designed to provide connection between different doping regions with small parasitic resistance and capacitance. Materials are chosen based on the favored work function: e.g., Ni and Ti are chosen to form good Ohmic contacts with p- and n-doped silicon nanowires, respectively.

S3DC also shares some fabric structures with the initial dynamic Skybridge fabric albeit differently integrated/utilized.

- 1) Uniform Vertical Gate-All-Around (V-GAA) Junctionless transistors: an n-type transistor structure is shown in Fig. 3(A). The source, channel, and drain regions are based on heavily doped vertical nanowires. Carefully selected gate electrodes and dielectric materials surround the nanowire. The V-GAA behavior is modulated by the work function difference between gate electrodes and channels [14].
- 2) Routing Bridges (in Fig. 3(B)) are horizontal metal wires connecting adjacent vertical nanowires.
- 3) Routing Nanowires are vertical nanowires (in Fig. 3(B)) that can also act as routing elements since they are heavily doped and silicided, and thus have high conductivity.
- 4) Coaxial Routing structures (in Fig. 3(B)) are metal layers formed along the vertical nanowires to add connectivity in vertical directions. The inner routing metal layer can be used for noise shielding. We have carefully optimized the material types and geometry parameters so that the metal layer has minor influence on the conductivity of the routing nanowire.
- 5) The intrinsic components for thermal management include Heat Extraction Junctions (HEJs), Heat Extraction Bridges (HEBs), and Heat Dissipating Power Pillars (HDPPs) as shown in Fig. 4. HEJs are specialized junctions designed for extracting heat from hot spots on logic nanowires. HEBs connect HEJs on one end and HDPPs on the other, and convey heat flow from HEJs to HDPPs. HDPPs are vertical metal pillars that are larger in area than vertical silicon nanowires, and thus have lower thermal resistance and provide good heat dissipating paths down to the substrate in the vertical direction. These structures are inserted during design cycles to improve the heat dissipation from hot regions to the heat sink. More details of these thermal management components can be found in [9], [10].

Fig. 4. S3DC thermal management components: (a) HEJ and HEB; (b) HDPP.

For additional intuition, see Fig. 5(A). It shows a three-input S3DC NAND gate as an example of a static CMOS logic circuit using the above concepts. The three p-type transistors on the top are connected at the source side by VDD, and on the drain side by the SB-ILCs. Thus, the pull-up network is parallel. Three n-type transistors at the bottom are connected in series by the vertical nanowire. They form the pull-down network. SB-ILCs connect the pull-up and pull-down networks to generate the output signal, which is conducted out by the Bridges. VDD and GND are delivered to each cell through the Bridges in the top and bottom layers. These two layers are reserved only for power delivery, which ensures enough resources to deliver power with minimal IR drop.

Fig. 5. S3DC 3-in NAND gate layout (dielectric for isolation between components and for structural support not shown): (a) Layout without transistor sizing; (b) Layout with transistor sizing for more balanced pull-up and pull-down network.

In S3DC technology, transistor sizing can be achieved by connecting multiple transistors in parallel across neighboring nanowires. This transistor sizing method is similar to FinFETs, and is quantized. For example, in the layout shown in Fig. 5(B), we have improved the drive strength in the pull-down network by replicating the transistor stack in parallel across adjacent n-type nanowires. This way, the pull-up and pull-down networks are more balanced in terms of drive strength at the cost of using more transistors.
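As a rough illustration of why this sizing is quantized, the sketch below counts how many parallel nanowire stacks are needed to reach a target drive current. It assumes a first-order model in which a stack of n identical series transistors delivers about 1/n of a single device's on-current; that model, the function name, and the example target are illustrative assumptions, not the sizing method used in this paper. The 17 µA and 16 µA figures are the n-type and p-type on-currents reported in Section IV-A.

```python
import math

def parallel_stacks_needed(target_ua, on_current_ua, series_devices):
    """Return the whole number of parallel nanowire stacks needed.

    Assumption (first-order, hypothetical): a stack of `series_devices`
    identical transistors drives on_current_ua / series_devices.
    """
    if target_ua <= 0 or on_current_ua <= 0 or series_devices < 1:
        raise ValueError("currents must be positive and series_devices >= 1")
    per_stack = on_current_ua / series_devices
    # Sizing is quantized: only whole stacks can be added, so round up.
    return math.ceil(target_ua / per_stack)

# Example: a pull-down stack of three n-type devices (17 uA each) sized to
# match the drive of a single 16 uA p-type pull-up device.
stacks = parallel_stacks_needed(target_ua=16.0, on_current_ua=17.0, series_devices=3)
print(stacks)
```

Because only whole stacks can be added, the achievable drive strength moves in steps of one stack's current, which is the same kind of granularity seen when adding fins in FinFET sizing.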

Compared with other 3D directions, S3DC has better pin access, improved routing flexibility from its 3D routing structures, and the fine-grained vertically assembled gates. All these benefits together greatly improve the S3DC routability in 3D. Fig. 6 shows the side-view inter-cell routing schematics of a (3, 2) counter in different 3D technologies as an example. The figure gives us a preview on how S3DC makes use of the vertical dimension efficiently to maintain good routability, despite the small footprints of S3DC logic gates. Additional details will be discussed in subsequent sections. Table II provides a comparison between key aspects of S3DC and other 3D directions.



Fig. 6. Inter-cell routing schematic of a (3,2) counter in different 3D ICs: (a) G-MI only adds inter-cell connectivity through MIVs, and uses more metal layers since both top and bottom tiers need to be routed; (b) T-MI improves intra-cell connectivity through MIVs but follows most inter-cell routing conventions; pin and routing congestion is likely due to smaller cell footprints; (c) S3DC's flexible 3D routing allows most wiring to be done within active layers without severe congestion; (d) Design of the (3,2) counter.

## III. S3DC SRAM CELL

In this section, we briefly introduce the SRAM cell design in S3DC technology for completeness. The cell design, shown in Fig. 7(A) and (B), conforms to the S3DC integration requirements; uniformly sized transistors are placed and routed within the vertical nanowire template to build the cell. It stores a value with cross-coupled inverters, as a conventional CMOS SRAM cell does. Cell stability is enhanced by using multiple word-line voltage levels, an approach shown to be effective by the Wordline Underdrive technique [16]. We apply stronger write and weaker read voltage levels, as shown in Fig. 7(C) and (D). In this way, we access the cell value without flipping it during reading, and ensure that the value to be written overpowers the cell value during writing.

## IV. S3DC SYSTEM-LEVEL DESIGN AND EVALUATION METHODOLOGY

In this section, we first introduce CAD tooling and methodology to evaluate the benefits of S3DC vs. state-of-the-art 3D IC and 2D CMOS. Metrics including routability, density, power, and performance are evaluated for several circuits. Here, we are using transistor-level monolithic 3D IC as the baseline.

Fig. 8 shows the system-level design flow for mapping large scale behavioral / RTL-level designs into S3DC physical layouts [17]. This is a standard ASIC semi-custom design flow based on commercial CAD tools.

S3DC technology uses the static CMOS circuit style, but its physical design differs significantly from 2D CMOS. Consequently, S3DC is compatible with non-physical CAD tools for logic synthesis and for timing and power analysis, but the CAD tools for physical design are not immediately suitable. To make these 2D CAD tools support S3DC designs, we represent S3DC physical designs in a way that is compatible with the 2D tools, essentially by finding functionally analogous concepts in 2D physical layouts for the S3DC fabric structures and setting appropriate constraints. This tooling currently supports one layer of S3DC vertical gates; future work will extend it to multiple vertically stacked S3DC fabric designs. Vertical stacking is limited by the nanowire aspect ratio; with state-of-the-art 50:1 vertical nanowires, up to two gates could be stacked vertically [9], [18]. Details are described as follows.

- i) A key observation is that S3DC routing fabric components are mappable to the metal layers in 2D tools; components at different nanowire heights are treated as lying in different metal layers defined in the 2D tools. For example, as shown in Fig. 9(A), the GND contact at the bottom of the nanowire and the SB-ILC that carries the output signal are represented as lying in the M1 and M5 layers, respectively, in the 2D tools.
- ii) Bridges provide horizontal connections, so they can be treated as metal wires in 2D tools since they have similar functions.
- iii) Routing Nanowires and Coaxial Routing structures carry signals in the vertical direction. They can be treated as vias in 2D tools.
- iv) Transistors occupy space and prohibit other routing structures from passing by, and thus can be represented as routing blockages in 2D tools.
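The correspondence in items i) to iv) can be summarized as a small lookup table. The sketch below only illustrates that correspondence; the dictionary, the function name, the string labels, and the convention that height index k maps to layer Mk are hypothetical and not part of any CAD tool's API.

```python
# Hypothetical summary of how S3DC fabric components are represented in 2D CAD tools.
S3DC_TO_2D = {
    "bridge": "metal wire",            # horizontal connection
    "routing_nanowire": "via",         # vertical signal path
    "coaxial_routing": "via",          # vertical signal path
    "transistor": "routing blockage",  # occupies space, blocks routing
}

def describe(component, height_index):
    """Describe an S3DC component at a nanowire height as a 2D-tool object.

    Assumption: height index k (1 = lowest) is modeled as metal layer Mk,
    following the idea that different heights map to different metal layers.
    """
    kind = S3DC_TO_2D[component]
    if kind == "via":
        # A vertical element joins the layer at this height to the next one up.
        return f"via M{height_index}-M{height_index + 1}"
    return f"{kind} on M{height_index}"

print(describe("bridge", 3))
print(describe("routing_nanowire", 2))
print(describe("transistor", 1))
```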

The rest of the section provides more details of the design flow.

#### A. S3DC Fabric Components Characterizations

We have validated and characterized the core fabric components, including SB-ILC, fabric Ohmic contacts, Coaxial Routing structures, and V-GAA Junctionless transistors, with 3D Sentaurus TCAD tools. The tools simulate both the process and device physics with nanoscale effects taken into account. TCAD simulations of V-GAA Junctionless transistors show an on-current of 17 $\mu\mathrm{A}$ and an on-off ratio of 1.7e+5 for n-type transistors, and an on-current of 16 $\mu\mathrm{A}$ and an on-off ratio of 2.1e+4 for p-type transistors. Simulation results for SB-ILC show that it provides good Ohmic contacts between different doping regions of a nanowire.

TABLE II
PARALLEL, MONOLITHIC 3D, VS. S3DC COMPARISON

|                 | Parallel 3D                                                         | Monolithic 3D                                                       | True 3D w/ S3DC                                                                                                   |
|-----------------|---------------------------------------------------------------------|---------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| Routing Element | Uses conventional 2D routing elements, added connectivity from TSVs | Uses conventional 2D routing elements, added connectivity from MIVs | Full 3D connectivity (vertical nanowire, Coaxial Routing and Bridges within one active layer, SB-ILC between layers) |
| Pin Access      | Pin access limited by cell surface                                  | Decreased pin access, limited by cell surface                       | Improved pin access from its 3D routing scheme                                                                    |
| Granularity     | Coarse-grained (limited by TSV alignment [1])                       | Finer-grained (cell- or transistor-level [15], layer-by-layer)      | Vertically-composed fine-grained (transistor stacking within one active layer)                                    |
| Process         | Separate process for each layer                                     | Layer-by-layer process, doping in each layer                        | Processed as a single wafer                                                                                       |

The characterization results of these fabric components are then modeled for use in circuit-level simulations. The IV characteristics were analyzed with DataFit [19], using regression analysis and polynomial fitting to obtain the mathematical equations that describe the device characteristics. These equations are then used to build the behavioral HSPICE models.

#### B. Cell RC Extraction

We have manually designed the standard cell layouts including logic gates, a buffer, and a flip flop, following the S3DC technology design rules [9]. We visualized the layouts with the 3D drawing tool SketchUp. RC extractions were manually done using the Predictive Technology Interconnect Models [20], following the dimensions and material types of the structures in the layouts. Physical HSPICE netlists were then built following the circuit topology and the extracted RC.

#### C. Characterization and Abstraction of Standard Cells

Synopsys SiliconSmart took the device models and the physical HSPICE netlists as the inputs, and performed power and timing characterization for each standard cell. These results have been written into a cell library file (LIB file), which is used during the later design and evaluation stages.

The cell Library Exchange Format (LEF) files, called cell abstracts, are used in Encounter-based cell-to-cell routing. They contain cell layout information including the dimensions of each cell, the location, layer, and dimensions of the pins, and the descriptions of obstructions (the metal layers / shapes used for intra-cell wiring). Fig. 9(B) and (C) show the layout design of a 3D 3-input NAND gate and its LEF abstract. Although the cell LEF format was originally designed for describing cell layouts in 2D CMOS technology, it can still represent S3DC cell layouts in the following way:

- i) The dimensions of cells in LEF represent the footprint of the S3DC cell layouts.

- ii) The pin access positions and dimensions in cell LEF files describe the positions / dimensions of the Coaxial Routing structures carrying the I/O signals as shown in Fig. 9(C). In S3DC, the I/O Coaxial Routing structures in the layout are accessible from multiple layers, so the corresponding pins in cell LEF files are simultaneously defined in several metal layers. For example, as shown in Fig. 9(C), input C is accessible at five different heights in the S3DC layout, so the pin C in cell LEF is defined in five layers.
- iii) Transistors, Ohmic contacts, and intra-cell wiring structures are all represented as obstructions in the cell LEF files since they all prohibit cell-to-cell routing from passing by.
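To make the multi-layer pin definition concrete, the sketch below emits a LEF-style macro in which one pin is declared on several metal layers and an intra-cell structure appears as an obstruction. The cell size, layer names, and rectangle coordinates are hypothetical placeholders chosen for illustration, not values from the S3DC cell library.

```python
def rect_line(rect):
    """Format a rectangle (x1, y1, x2, y2) as a LEF RECT statement."""
    x1, y1, x2, y2 = rect
    return f"RECT {x1:.3f} {y1:.3f} {x2:.3f} {y2:.3f} ;"

def lef_pin(name, direction, layers, rect):
    """Emit a LEF PIN whose port geometry is repeated on every layer in `layers`."""
    lines = [f"  PIN {name}", f"    DIRECTION {direction} ;", "    PORT"]
    for layer in layers:
        lines.append(f"      LAYER {layer} ;")
        lines.append(f"        {rect_line(rect)}")
    lines += ["    END", f"  END {name}"]
    return lines

def lef_macro(cell, width, height, pins, obstructions):
    """Emit a minimal LEF MACRO with pins and an OBS (obstruction) section."""
    lines = [f"MACRO {cell}", "  CLASS CORE ;", f"  SIZE {width:.3f} BY {height:.3f} ;"]
    for pin in pins:
        lines += lef_pin(*pin)
    lines.append("  OBS")
    for layer, rect in obstructions:
        lines.append(f"    LAYER {layer} ;")
        lines.append(f"      {rect_line(rect)}")
    lines += ["  END", f"END {cell}"]
    return "\n".join(lines)

# Hypothetical example: input C reachable at five heights -> five metal layers.
pin_c = ("C", "INPUT", ["M2", "M3", "M4", "M5", "M6"], (0.010, 0.010, 0.026, 0.026))
obs = [("M1", (0.040, 0.010, 0.056, 0.026))]  # e.g. a transistor blocking M1
lef_text = lef_macro("NAND3X1", 0.100, 0.100, [pin_c], obs)
print(lef_text)
```

Declaring the same pin shape on several layers is what lets a 2D router treat each height of the Coaxial Routing structure as a separate access point.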

#### D. Imitating Cell-to-Cell Routing in Encounter

Cadence Encounter is designed to implement 2D CMOS layouts. It treats each standard cell as a black box, knowing only its cell dimensions and the pin and obstruction information from the cell LEF files; it places the cells and routes the nets such that performance, power, and area are optimized. To make Encounter generate correct S3DC physical designs, in addition to the aforementioned ways of representing S3DC designs in 2D tools, we have added two inter-cell routing constraints that imitate the S3DC routing style, as shown in Fig. 9(D):

- i) In S3DC, nanowires are uniformly distributed in an array. The vertical routing, including using Routing Nanowires and Coaxial Routing structures, can only be achieved along these uniformly-distributed nanowires. Consequently, the vias representing these S3DC vertical routing elements in Encounter are only allowed to be placed where the nanowires are positioned in the nanowire array template.
- ii) The Bridges connect the nanowires and thus are only placed along the tracks defined by the rows / columns of nanowires. So in 2D tools the wires representing these Bridges should only be allowed on the discrete tracks separated by the nanowire pitch in the S3DC template.
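The two constraints can be read as grid-legality checks. The following sketch, using a hypothetical nanowire pitch and integer nanometer coordinates, tests whether a via sits on a nanowire position and whether a straight wire runs along a nanowire row or column. It illustrates the rule only and is not how Encounter or the technology LEF file encodes it.

```python
PITCH_NM = 40  # hypothetical nanowire pitch; the real value comes from the S3DC design rules

def on_grid(coord_nm, pitch_nm=PITCH_NM):
    """True if a coordinate falls on a multiple of the nanowire pitch."""
    return coord_nm % pitch_nm == 0

def via_is_legal(x_nm, y_nm, pitch_nm=PITCH_NM):
    """A via (Routing Nanowire or Coaxial Routing) must sit on a nanowire position."""
    return on_grid(x_nm, pitch_nm) and on_grid(y_nm, pitch_nm)

def bridge_is_legal(start, end, pitch_nm=PITCH_NM):
    """A Bridge must join two distinct nanowire positions along one row or column."""
    (x1, y1), (x2, y2) = start, end
    if not (via_is_legal(x1, y1, pitch_nm) and via_is_legal(x2, y2, pitch_nm)):
        return False
    horizontal = y1 == y2 and x1 != x2
    vertical = x1 == x2 and y1 != y2
    return horizontal or vertical

print(via_is_legal(80, 120))               # on the nanowire array
print(via_is_legal(80, 100))               # off the array in y
print(bridge_is_legal((0, 40), (80, 40)))  # runs along a nanowire row
print(bridge_is_legal((0, 40), (80, 80)))  # diagonal, not on a single track
```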

All these constraints can be defined in the technology LEF file, which contains the routing rules. Other parameters, including design rules, are also captured in the technology LEF and TCH files. The TCH file sets the inter-cell RC extraction rules, and is generated by Cadence Techgen based on the metal layer design rules. With the cell LEF file, the technology LEF file, and the TCH file, Encounter can imitate the S3DC physical design style and perform placement and routing for S3DC designs. We have measured the area / footprint of the designs from the layouts in Encounter. Encounter also generates the SPEF file, which captures the inter-cell routing RC information of the physical implementation.

Fig. 7. S3DC SRAM: (a) 6T S3DC SRAM cell schematic; (b) 6T S3DC SRAM cell layout; (c) Write operation: write-access n-type transistor strongly turned on to overpower the feedback inverter; (d) Read operation: read-access p-type transistor weakly turned on to maintain cell stability during read.

Fig. 8. S3DC device-to-system design flow.

Fig. 9. S3DC layout description in 2D CAD tools: (a) AOI2X1 S3DC layout and its fabric components at different nanowire heights described as in different metal layers in 2D tools; (b) NAND3X1 S3DC layout and its (c) Cell abstract; (d) Encounter routing constraints imitating the S3DC routing styles; red squares correspond to the positions of vertical nanowires.

Although Encounter can generate correct S3DC physical designs, it is still incompatible with some S3DC features, and thus leads to suboptimal S3DC physical designs. For example, S3DC can route two signals vertically along one nanowire, through the Routing Nanowire and the Coaxial Routing structure, while one via in Encounter can carry only one signal; moreover, S3DC can stack two gates vertically, but we implemented only one layer of gates in Encounter.

#### E. Evaluation of Performance, Power, and Area

To evaluate performance and power metrics, we performed timing and power analysis with Synopsys PrimeTime. PrimeTime took two main input files: the LIB file containing the timing and power characterization results of the S3DC standard cell layouts, and the SPEF file capturing the inter-cell routing information.

## V. S3DC EVALUATION RESULTS

In this section, we first analyze the routability of different technologies. Then we present other key performance, power, and density metrics. We benchmark six designs, including 4- and 16-bit array-based multipliers, a 4-bit WISP-4 microprocessor, LDPC, DES, and JPEG [21], at the 16-nm technology node.

#### A. Routability Analysis

As we have mentioned before, T-MI has severe pin access issues. The pin congestions also make the wires that need to access these pins very congested as well. On the other hand, the S3DC technology avoids these routability issues. To quantify the routing congestion / routability, we need to understand that routing congestions happen when the routing demand exceeds the available routing resources. Consequently, to analyze the routability benefits of S3DC, we have used the metric of the ratio of routing demand to routing resource [17].

We estimate the routing demand by using its relationship with the cell density G per unit square area

$$l \sim G^{r-0.5} (r > 0.5)$$

where l represents the routing demand, G represents the number of cells that should be routed per unit square area and r is the Rent's exponent. In this r is set to be 0.75, following the typical value for large-scale static-CMOS circuit designs [22]. To calculate G, we have used Rent's rule, a well-known empirical relationship between the required terminal count of a design block and the number of cells in the block. This rule is applicable to technologies that route the design by connecting cells with inter-cell nets, including 3D technologies [11], [23]. The Rent's rule can be represented as

$$E = A \cdot G^r$$



Fig. 10. Routing demand / resource ratio of LDPC in all technologies.

so G can be calculated as

$$G = \left(\frac{E}{A}\right)^{\frac{1}{r}}$$

where A is the average terminal count per cell, and is set to be 3 for all the technologies [22]. E is the number of terminals per unit square area. In planar CMOS and T-MI technology, Cadence Encounter reports the pin density E. In S3DC technology, as the pins access the 3D gate layouts from multiple layers, the terminal count in each layer,  $E_{S3DC}$ , is effectively

$$E_{S3DC} = E_{ENC} \cdot \frac{1}{N}$$

where  $E_{ENC}$  is the total pin density across all pin layers, which is reported by Encounter, and N is the number of layers that are reserved for pin accesses in S3DC standard cell design. Routing resources of each technology can be estimated by multiplying the design footprints by the number of routing tracks per unit area.
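The estimation steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the flow used for the reported results: the function names, the pin density value, and the number of pin-access layers are assumptions chosen only to show how the relations combine, and the routing demand is known only up to a proportionality constant.

```python
def cell_density(E, A=3.0, r=0.75):
    """Invert Rent's rule E = A * G**r to obtain the cell density G."""
    return (E / A) ** (1.0 / r)


def routing_demand(G, r=0.75):
    """Routing demand up to a constant factor: l ~ G**(r - 0.5), for r > 0.5."""
    if r <= 0.5:
        raise ValueError("the relation is stated only for r > 0.5")
    return G ** (r - 0.5)


def s3dc_pin_density(E_enc, n_pin_layers):
    """Per-layer terminal density in S3DC: E_S3DC = E_ENC / N."""
    return E_enc / n_pin_layers


# Hypothetical inputs, for illustration only (not taken from the benchmarks).
E_ENC = 48.0   # total pin density reported by the tool, arbitrary units
N_LAYERS = 4   # assumed number of pin-access layers

demand_single_layer = routing_demand(cell_density(E_ENC))
demand_s3dc = routing_demand(cell_density(s3dc_pin_density(E_ENC, N_LAYERS)))
demand_ratio = demand_s3dc / demand_single_layer
print(f"relative routing demand: {demand_ratio:.3f}")
```

With r = 0.75 the demand scales as (E/A)^(1/3), so in this hypothetical case spreading the same pins over four layers lowers the per-layer demand estimate to roughly 63% of the single-layer value.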

We have analyzed the routability using the results of the six benchmarks. Fig. 10 shows the normalized ratios of routing demand to routing resources in all metal layers for the LDPC benchmark. We show the LDPC results since this benchmark is interconnect dominated and thus reflects the routability of a technology well. From the results we can see that the T-MI LDPC design does most of its routing in the lower metal layers, and thus has severe routing congestion in the M1, M2, and M3 layers. By contrast, in S3DC the routing demand is distributed more evenly across all wiring layers, and makes better use of the upper layers that are usually not well utilized in 2D CMOS and T-MI technologies. The demand/resource ratio is at most 0.8 in S3DC, which is lower than in the other technologies. The ratios of the upper layers are higher, but still well below 1, meaning that the upper layers in S3DC are better used without introducing congestion in these layers.

#### B. Power, Performance and Density Results

The power, performance, and area have also been evaluated based on the six benchmarks. The methodology introduced in Section IV is followed to generate the S3DC designs. Further manual optimizations have been applied to the generated S3DC physical designs to realize more S3DC features that cannot be implemented with the automatic CAD flow. To name a few of these manual optimizations: we implement S3DC designs using two vertically stacked layers of gates; two signals can be carried vertically at the same time on the same nanowire, by using the inner nanowire and the outer coaxial metal layer; and some of the inter-cell wires have been customized to fully use the 3D space and make the layout more compact. We have optimized the 4-/16-bit multipliers and the WISP-4 designs. The other three benchmarks are prohibitively large for manual optimization. Consequently, their implementations follow the suboptimal rules described in Section IV-D: only one gate layer is utilized, and only one signal can be carried vertically on one nanowire, because the Cadence Encounter tool cannot represent the connections of both the Coaxial Routing structure and the routing nanowire with one via.

TABLE III SYSTEM-LEVEL BENCHMARKING RESULTS

| Benchmark          | Technology | Cell Count | Best Frequency (GHz) | Total Wirelength (mm) | Total Power (mW) | Footprint (normalized to 2D) | PPA (normalized to 2D) |
|--------------------|------------|------------|----------------------|-----------------------|------------------|------------------------------|------------------------|
| 4-bit Multiplier*  | 2D         | 102        | 4.26                 | 0.31                  | 3.31E-3          | 1.00                         | 1.00                   |
|                    | T-MI       | 96         | 4.63 (+9%)           | 0.22 (-29%)           | 2.75E-3 (-17%)   | 0.48 (-52%)                  | 2.51                   |
|                    | S3DC       | 118        | 4.55 (+7%)           | 0.04 (-87%)           | 7.46E-4 (-77%)   | 0.03 (-97%)                  | 145                    |
| 16-bit Multiplier* | 2D         | 2281       | 3.86                 | 5.26                  | 0.29             | 1.00                         | 1.00                   |
|                    | T-MI       | 2158       | 4.37 (+13%)          | 3.63 (-31%)           | 0.23 (-20%)      | 0.48 (-52%)                  | 2.60                   |
|                    | S3DC       | 2484       | 4.54 (+18%)          | 0.58 (-89%)           | 5.21E-2 (-82%)   | 0.03 (-97%)                  | 185                    |
| WISP-4 Processor*  | 2D         | 339        | 3.82                 | 0.63                  | 0.02             | 1.00                         | 1.00                   |
|                    | T-MI       | 324        | 4.17 (+9%)           | 0.46 (-27%)           | 0.017 (-18%)     | 0.5 (-50%)                   | 2.44                   |
|                    | S3DC       | 363        | 4.55 (+19%)          | 0.18 (-71%)           | 4.08E-3 (-80%)   | 0.04 (-96%)                  | 125                    |
| DES                | 2D         | 52380      | 4.6                  | 99.00                 | 3.96             | 1.00                         | 1.00                   |
|                    | T-MI       | 51450      | 5.3 (+15%)           | 71.28 (-28%)          | 3.30 (-17%)      | 0.49 (-51%)                  | 2.46                   |
|                    | S3DC       | 53450      | 4.1 (-12%)           | 30.69 (-69%)          | 1.25 (-66%)      | 0.10 (-90%)                  | 24.51                  |
| LDPC               | 2D         | 36890      | 1.9                  | 616.72                | 3.40             | 1.00                         | 1.00                   |
|                    | T-MI       | 34780      | 2.2 (+17%)           | 413.20 (-33%)         | 2.66 (-22%)      | 0.50 (-50%)                  | 2.56                   |
|                    | S3DC       | 37689      | 1.7 (-10%)           | 123.3 (-80%)          | 1.16 (-62%)      | 0.11 (-89%)                  | 26.74                  |
| JPEG               | 2D         | 297028     | 1.2                  | 600.29                | 9.24             | 1.00                         | 1.00                   |
|                    | T-MI       | 287986     | 1.37 (+14%)          | 426.21 (-29%)         | 7.65 (-20%)      | 0.48 (-52%)                  | 2.54                   |
|                    | S3DC       | 299076     | 1.1 (-8%)            | 180.08 (-70%)         | 3.47 (-61%)      | 0.11 (-89%)                  | 24.57                  |

We have measured the best operating frequencies and total power consumption with PrimeTime. The total power was measured at an operating frequency of 1 GHz and an input activity factor of 0.1 for all designs. Encounter reported the footprint and total wirelength of each design. We have also included a metric called PPA (power, performance, area), which comprehensively evaluates the efficiency of a design with the expression "clock frequency / (power \* footprint)". All the results are shown in Table III.
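As a small illustration of how the PPA expression behaves, the following Python sketch evaluates "clock frequency / (power \* footprint)" for two designs and normalizes one against the other. The `Design` class, the function names, and all numeric values are hypothetical and are not taken from Table III; the sketch shows only the arithmetic of the metric, not the exact operating points behind the reported column.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    freq_ghz: float    # clock frequency
    power_mw: float    # total power
    footprint: float   # area, in any consistent unit


def ppa(design: Design) -> float:
    """PPA as defined in the text: clock frequency / (power * footprint)."""
    return design.freq_ghz / (design.power_mw * design.footprint)


def normalized_ppa(design: Design, baseline: Design) -> float:
    """PPA of a design relative to a baseline design (baseline maps to 1.0)."""
    return ppa(design) / ppa(baseline)


# Hypothetical designs, for illustration only.
baseline = Design("baseline", freq_ghz=2.0, power_mw=4.0, footprint=1.0)
candidate = Design("candidate", freq_ghz=2.2, power_mw=2.0, footprint=0.25)

gain = normalized_ppa(candidate, baseline)
print(f"{candidate.name}: {gain:.1f}x the PPA of {baseline.name}")
```

Because power and footprint multiply in the denominator, a design that halves power and shrinks the footprint to one quarter gains a factor of eight before any frequency change is counted.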

The normalized footprint results show that S3DC technology leads to ultra-high-density designs. The density benefits range from 9X to 40X when compared with 2D CMOS, and from 4X to 19X when compared with T-MI. The large total wirelength reduction in S3DC also contributes to large savings in power consumption, which is 56%–77% lower than with T-MI. Compared with T-MI using state-of-the-art FinFETs, the best frequencies of S3DC designs range from a 20% loss to a 9% benefit. Among these results, DES, LDPC, and JPEG are suboptimal and have worse results than the others since their automatically generated designs have not been manually optimized. Also, the S3DC transistors can be further optimized to achieve higher performance. The PPA benefits of S3DC range from 9.7X to 71.1X when compared with T-MI, showing that S3DC technology has one to two orders of magnitude benefits in overall efficiency in the circuits studied.

TABLE IV WORST-CASE HOT SPOT TEMPERATURE

| Configuration        | Inverter | 2-in NAND | 3-in NAND | 4-in NAND |
|----------------------|----------|-----------|-----------|-----------|
| No Heat Extraction   | 2631 K   | 1711 K    | 1569 K    | 1367 K    |
| With Heat Extraction | 384 K    | 374 K     | 368 K     | 364 K     |

#### C. Thermal Management Evaluation

We have performed an evaluation of our S3DC thermal management approach to show that S3DC technology can effectively manage the thermal profile despite its ultra-high density.

The S3DC thermal management was evaluated with analogous analysis in the electrical domain [24]. Equivalent thermal resistance models for transistors and logic-implementing nanowires following similar principles in reference [10] have been developed. Next, we built benchmark circuits in scenarios where two layers of various kinds of S3DC gates are stacked on one nanowire, and completed HSPICE simulations for worst-case heat dissipation scenarios where the transistors generate most total heat. We measured the highest temperature in each layout as shown in Table IV.
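The electrical analogy behind this kind of analysis can be illustrated with a lumped sketch in Python: heat flow plays the role of current, temperature the role of voltage, and thermal resistances combine like electrical ones. The sketch below is a single-node simplification, not the HSPICE models used in this work, and every numeric value in it is a hypothetical placeholder.

```python
def series(*resistances):
    """Thermal resistances along one heat path add up."""
    return sum(resistances)


def parallel(*resistances):
    """Independent heat paths to the sink combine like parallel resistors."""
    return 1.0 / sum(1.0 / r for r in resistances)


def hot_spot_temperature(t_sink_k, heat_w, r_th_k_per_w):
    """Temperature rise above the sink is heat flow times thermal resistance."""
    return t_sink_k + heat_w * r_th_k_per_w


# Hypothetical values, for illustration only.
T_SINK = 300.0                        # sink temperature (K)
HEAT = 1e-5                           # heat generated by the gate (W)
R_NANOWIRE = series(6e7, 4e7)         # long path through the nanowire (K/W)
R_EXTRACTION = 1e7                    # added heat-extraction path (K/W)

t_without = hot_spot_temperature(T_SINK, HEAT, R_NANOWIRE)
t_with = hot_spot_temperature(T_SINK, HEAT, parallel(R_NANOWIRE, R_EXTRACTION))
print(f"without extraction: {t_without:.0f} K, with extraction: {t_with:.0f} K")
```

Adding a low-resistance path in parallel dominates the effective resistance, which is why a dedicated extraction path can lower the hot spot temperature much more than shortening the original path would.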

<sup>\*</sup>Benchmarks followed by an asterisk are manually optimized for S3DC physical designs.

Fig. 11. S3DC transistor fabrication: (a) Starting nanowire; the heavily-n-type-doped region for building n-type transistors; (b) HfO2 ALD for the gate dielectric formation; (c) Selective material deposition (TiN in this case) for gate electrode formation; (d) Insulator deposition and planarization; (e) Isotropic HfO2 etching; (f) More transistors sequentially stacked on one active layer.

As we can see from the results, S3DC circuits without fabric-level thermal management can reach very high temperatures, due to the high density, the long thermal paths, and the surface scattering and confinement effects that reduce the thermal conductivity of thin nanowires. With HEJ (one for each gate) and HDPP placed in the circuits, the hot spot temperature is reduced by up to 85%. Although we conservatively assumed that no gate input/output wires provide heat dissipation, the critical temperature is reduced to 384 K, which is below the threshold temperature for modern microprocessors [25], and indicates the effectiveness of the intrinsic heat management fabric components.
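The reduction quoted above can be checked against the entries of Table IV with a short Python sketch. It assumes that the percentage is taken relative to the absolute temperature in kelvin without heat extraction; the dictionary and function names are illustrative.

```python
# Worst-case hot spot temperatures from Table IV, in kelvin:
# (no heat extraction, with heat extraction)
table_iv = {
    "Inverter": (2631.0, 384.0),
    "2-in NAND": (1711.0, 374.0),
    "3-in NAND": (1569.0, 368.0),
    "4-in NAND": (1367.0, 364.0),
}


def reduction_percent(before_k, after_k):
    """Relative reduction of the absolute temperature, in percent."""
    return 100.0 * (before_k - after_k) / before_k


reductions = {gate: reduction_percent(*temps) for gate, temps in table_iv.items()}
best_gate = max(reductions, key=reductions.get)

for gate, value in reductions.items():
    print(f"{gate}: {value:.1f}% lower")
```

Under this assumption the largest reduction is obtained for the inverter, at about 85%, and the other gates fall between roughly 73% and 78%.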

The increased overall power density of the chip also requires more heat to be dissipated by the cooling system. As the benchmark results in Table III show, the power density of S3DC circuits increases by 3.1–7.4X compared with 2D CMOS, and by 1.8–4.3X compared with T-MI. The increased power density somewhat widens the gap between the chip power density and the heat flux that forced-air cooling system can dissipate [26]. Large heat sinks, switching to liquid cooling, or adopting other high heat flux cooling methods such as microchannel [27] and microjet impingement [28] may be employed in emerging 3D IC technologies such as S3DC.
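The power density comparison can be approximated from Table III by dividing the total power by the normalized footprint. The Python sketch below does this with the rounded table entries, so its results only approximate the ranges quoted above; the data layout and names are illustrative.

```python
# (total power in mW, normalized footprint) per technology, from Table III.
table_iii = {
    "4-bit Multiplier": {"2D": (3.31e-3, 1.00), "T-MI": (2.75e-3, 0.48), "S3DC": (7.46e-4, 0.03)},
    "16-bit Multiplier": {"2D": (0.29, 1.00), "T-MI": (0.23, 0.48), "S3DC": (5.21e-2, 0.03)},
    "WISP-4 Processor": {"2D": (0.02, 1.00), "T-MI": (0.017, 0.5), "S3DC": (4.08e-3, 0.04)},
    "DES": {"2D": (3.96, 1.00), "T-MI": (3.30, 0.49), "S3DC": (1.25, 0.10)},
    "LDPC": {"2D": (3.40, 1.00), "T-MI": (2.66, 0.50), "S3DC": (1.16, 0.11)},
    "JPEG": {"2D": (9.24, 1.00), "T-MI": (7.65, 0.48), "S3DC": (3.47, 0.11)},
}


def power_density(power_mw, footprint):
    """Power per unit of normalized footprint."""
    return power_mw / footprint


def density_ratio(rows, tech, reference):
    """Power density of `tech` relative to `reference` for one benchmark."""
    return power_density(*rows[tech]) / power_density(*rows[reference])


vs_2d = {name: density_ratio(rows, "S3DC", "2D") for name, rows in table_iii.items()}
vs_tmi = {name: density_ratio(rows, "S3DC", "T-MI") for name, rows in table_iii.items()}

print(f"vs 2D:   {min(vs_2d.values()):.1f}x to {max(vs_2d.values()):.1f}x")
print(f"vs T-MI: {min(vs_tmi.values()):.1f}x to {max(vs_tmi.values()):.1f}x")
```

The smallest ratios come from the large, automatically generated benchmarks and the largest from the manually optimized 4-bit multiplier, which has the smallest footprint.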

#### VI. S3DC TECHNOLOGY MANUFACTURING DISCUSSIONS

In this section, the S3DC manufacturing pathway is introduced, and the manufacturing feasibility of S3DC fabric is discussed including highlighting related experimental demonstrations.

#### A. S3DC Manufacturing Pathway

Fig. 11 shows the manufacturing pathway of an S3DC Vertical Gate-All-Around Junctionless transistor. As we can see, it is based on multi-layer material insertion to functionalize a uniform nanowire template. In S3DC one processes an IC as a single wafer in contrast to the parallel / monolithic 3D integration, which manufactures circuits in a layer-by-layer manner.

Furthermore, S3DC fabric does not involve a selective doping process after the nanowire template formation; in monolithic 3D IC, however, doping is necessary for fabricating each IC layer, which may harm the bottom layer circuits due to the high temperature dopant activation process. The S3DC manufacturing pathway allows the stacking of multiple components, such as transistors, contacts and metal routing structures within one doping layer of nanowires as we can see from previously shown circuit layouts. It also shifts the lithography precision requirement to material deposition, which is known to be controllable more precisely (and thus even could alleviate the lithography-imperfection-induced variations).

#### B. Experimental Demonstrations

S3DC IC manufacturing generally includes two types of process steps: the uniform vertical nanowire template formation and multi-level selective material deposition. Therefore, a validation of these two major steps is helpful to demonstrate the manufacturability of S3DC technology.

To form the template, a wafer containing several layers with p and n doping profiles is first obtained by bonding individual p and n silicon wafers. Vertical nanowires are then formed in a top-down manner by applying high-aspect-ratio anisotropic silicon etching to the prepared layered wafer. Every step of this template formation process has been demonstrated: wafer bonding technology has been widely used and demonstrated in current monolithic 3D integration [4]; vertical nanowire patterning can be achieved through processes such as the Bosch Process [29] and Inductively Coupled Plasma etching (~50:1 aspect ratio, 5 nm dimension shown) [18], and has been experimentally demonstrated in our group as shown in Fig. 12(a).

Following the nanowire patterning, multi-level selective material deposition functionalizes the template. Similar to the deposition techniques in CMOS processes, selective material deposition in S3DC manufacturing involves steps including lithography, planarization, deposition, and lift-off. Among these steps, planarization in S3DC is more challenging, since the conventional Chemical Mechanical Polishing (CMP) process could cause structural damage to the vertical nanowires. Consequently, an alternative technique based on etch-back of a self-planarizing material is used in S3DC. This technique planarizes the photoresist surface by coating a thick self-planarizing resist (SU-8) layer to completely cover the nanowires and then etching the photoresist layer back to the desired thickness. This approach has been experimentally demonstrated in our group [30]. All the other steps of material deposition can be done similarly to conventional CMOS manufacturing. Relying on the new planarization technique, precisely controlled selective material depositions (various kinds of metal and oxide) in the S3DC nanowire template can be achieved and are shown in Fig. 12(b). While all critical process steps have been validated, our longer-term (multiyear) goal is to attempt a simple S3DC circuit, with collaborators, as we gradually refine the individual process steps involved.



Fig. 12. Cleanroom validations for S3DC manufacturability (see [30]): (a) Vertical nanowire template demonstration: nanowires with different widths from 26 nm to 200 nm (top figures) and with mostly uniform 197 nm width and 1100 nm height (bottom figures), masks defining nanowires are colored in red; (b) Metal-silicon contact as a demonstration of selective anisotropic metal deposition, masks defining nanowires are colored in red, contacts are colored in green.

#### C. Manufacturing Cost Discussion

In this section, we briefly discuss manufacturing cost implications of S3DC circuits, and compare these aspects with other 3D technologies. Also, we discuss options to decrease the production cost of S3DC circuits.

The manufacturing cost per transistor is a useful metric to evaluate the cost of a technology. With a lower cost per transistor, we can manufacture a chip that realizes a given functionality at a lower cost. Compared with FinFET-based technologies, S3DC has a much simpler Front End of Line (FEOL) process, involving only two selective deposition steps as shown in our envisioned manufacturing pathway. On the other hand, state-of-the-art FinFETs require very complex device engineering steps, including fin patterning, several doping steps (for channel, halo/extensions, and heavily doped source/drain), spacer deposition, the deposition and removal of the dummy gate stack, the formation of the replacement gate stack, and so on. The simpler S3DC device-building process is a great advantage over the monolithic 3D technology that uses FinFETs when comparing the manufacturing cost per transistor.

Another potential advantage of S3DC technology is its less stringent lithography and overlay precision requirements. First, in the S3DC manufacturing pathway, the transistor channel length is defined by the thickness of the deposited gate material. This approach shifts the lithography precision requirement to material deposition, which is known to be precisely controllable at a lower cost. Moreover, during each process step, we project that S3DC technology is likely to suffer less from the yield loss caused by mask misalignment, owing to the use of regular structures in S3DC layouts. Although this is not yet proven for S3DC technology, in our previous work [31] we evaluated NASIC technology, which has 2D grid-based nanowire structures. It was shown that periodic regular structures tend not to impose stringent constraints on overlay precision. A comprehensive study of the yield loss of S3DC and other 3D integration technologies is an ongoing project in our group.

Also, as traditional CMOS technology scaling by shrinking the devices approaches fundamental limits, the production of 2D ICs will become more and more expensive, and eventually too difficult to realize. Consequently, although scaling towards 3D by adding more layers may seem to be expensive in current technology nodes, it may become inevitable and possibly more economical than 2D scaling in future technology nodes.

One of the drawbacks of S3DC technology is its large number of process steps. This could potentially slow down the production of each chip. Several methods can be used to mitigate this drawback. For example, we can decrease the number of manufacturing steps by using only one layer of logic gates (up to 8 stacked transistors) and still achieve significant benefits, as demonstrated by the DES, LDPC, and JPEG results in Table III. Also, as S3DC benefits come mainly from vertical scaling, we can relax the precision requirement on lithography techniques to reduce the cost.

#### D. Sensitivity Analysis on Nanowire Profile Variation

The nanowires in S3DC are formed by vertical patterning. As we can see from our experimental validation results in Fig. 12, the bottom regions of nanowires are often wider than the top, forming a tapered nanowire profile. This tapered nanowire profile has also been found in reference [18]. The different nanowire diameters lead to variations in S3DC transistors, and influence the S3DC circuits. We have evaluated the effects of such nanowire geometry on S3DC circuits. The nanowire configuration considered for this study is shown in Fig. 13.

Fig. 13. Scenarios of sensitivity analysis on nanowire profile variation. (a) Side view of 4-input NAND gate layout on tapered nanowires; (b) Side view of 4-input NOR gate layout on tapered nanowires.

As is shown in the figure, the nanowire width gradually decreases from the bottom region (32 nm) to the top (16 nm). We assume that the bottom two n-type transistors have 32 nm widths, followed by two 22 nm-wide n-type transistors and four 16 nm-wide p-type transistors on the top. To ensure proper on-off ratio, we used a doping concentration of 1E+18 for the 32 nm and 22 nm transistors, which was chosen based on TCAD simulation results. This optimization would not introduce much additional complexity since it would be coarse-grained and at the wafer level. We have chosen 4-input NAND and 4-input NOR gates as examples since the nanowire variation influences as many as four transistors in these layouts.

To analyze these scenarios, first, we have performed TCAD simulations for transistors with various widths. Compared with 16 nm n-type transistors, 32 nm n-type transistors have comparable characteristics, while 22 nm n-type transistors have higher threshold voltage and lower on-current. The device characteristics from the simulations were then modeled following the methodology in Section IV-A. Physical-level HSPICE netlists were built for the two circuit layouts shown in Fig. 13. Fig. 14 shows the simulation waveforms of the transitions with critical delays. As expected, the tapered nanowire profile leads to performance degradation; the critical delay increased from 24 ps to 37 ps for 4-input NAND gate, and from 28 ps to 33 ps for 4-input NOR gate. The power consumption at best frequency of the tapered nanowire case is 29% lower for NAND gate and 14% lower for NOR gate, when compared with the circuits built on uniform nanowires. Density is expected to decrease by 17%, as the nanowire pitch needs to increase to maintain enough space at the bottom of the nanowires.
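The relative delay penalty implied by these numbers can be worked out with a short Python sketch. The delay values are the ones quoted above; the dictionary and function names are illustrative.

```python
# Critical delays in picoseconds: (uniform nanowire, tapered nanowire).
critical_delay_ps = {
    "4-input NAND": (24.0, 37.0),
    "4-input NOR": (28.0, 33.0),
}


def delay_increase_percent(uniform_ps, tapered_ps):
    """Relative increase of the critical delay caused by the tapered profile."""
    return 100.0 * (tapered_ps - uniform_ps) / uniform_ps


increase = {
    gate: delay_increase_percent(*delays)
    for gate, delays in critical_delay_ps.items()
}

for gate, value in increase.items():
    print(f"{gate}: critical delay +{value:.1f}%")
```

The NAND gate is affected more strongly (about 54%) than the NOR gate (about 18%).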



Fig. 14. HSPICE simulation results showing impact of nanowire profile variation. (a) Waveform of 4-input NAND gate; (b) Waveform of 4-input NOR gate.


#### E. Sensitivity Analysis on Coaxial Routing Structure Designs

The design of S3DC interconnection components can also influence the behavior of S3DC circuits. We have explored the sensitivity of the S3DC circuits on different designs of S3DC Coaxial Routing structures, with various geometry parameters and material choices.

The Coaxial Routing structure can affect the conductivity of the surrounded inner silicon nanowire, since the inner metal layer and the doped nanowire form a metal-dielectric-silicon structure. The strength of this effect largely depends on the dielectric layer. The dielectric layer can be implemented with different geometry parameters and material types. We have explored the options of using SiO<sub>2</sub> or C-SiO<sub>2</sub> (low-*k* dielectric) [32] as dielectric materials with the layer thickness of 4 nm, 7 nm, and 10 nm.

To evaluate the influence of various Coaxial Routing structure designs on S3DC circuits, first we have characterized the different designs using TCAD simulations and modeled the nanowire resistance. Then we did circuit-level evaluations by performing HSPICE simulations. The impact of Coaxial Routing structures on the nanowire resistance is proportional to the length of the nanowire being covered by the coaxial metal layer. Hence, to show the worst-case impact of the Coaxial Routing structures on the circuits, the circuit layout was designed in the way that the coaxial metal layers cover the majority of the length of the vertical nanowire.



Fig. 15. Sensitivity analysis on various Coaxial Routing design rules. (a) IV characteristics of Coaxial Routing structures (100 nm long) with different design rules (when inner metal layer used for noise shielding) (non-linear IV due to velocity saturation); (b) Scenario of circuit-level Coaxial Routing structure analysis; (c) Waveforms of circuit-level simulation results.

The evaluation results are shown in Fig. 15. Fig. 15(a) shows the IV curves of 100 nm-long nanowires surrounded by various designs of Coaxial Routing structures. The nanowire resistance increases by 24%–80% compared with the intrinsic nanowire resistance. The structure we have been using in our circuit designs, with a 7 nm C-SiO<sub>2</sub> dielectric layer, leads to a 29% increase in nanowire resistance. The scenario established for circuit-level evaluation is shown in Fig. 15(b), and Fig. 15(c) shows the waveforms of the HSPICE simulation. From the results, we can see that the Coaxial Routing structures increase the delays due to the larger nanowire resistance and load capacitance. Compared with the case in which the nanowire is not surrounded by Coaxial Routing structures, the design with the 7 nm C-SiO<sub>2</sub> dielectric layer increases the delay from 14 ps to 18 ps. The structures with the thick 10 nm C-SiO<sub>2</sub> dielectric layer lead to negligible performance loss, but have an 8% density penalty. On the other hand, the structures with the thin 4 nm dielectric layers lead to too much performance degradation. Consequently, by using Coaxial Routing structures with 7 nm C-SiO<sub>2</sub> dielectric layers, S3DC circuits can have more resources for inter-cell vertical routing, with only minor performance implications for logic cells. Nevertheless, other design points are also valid and can be chosen depending on end-user objectives.
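The statement that the resistance impact is proportional to the covered nanowire length can be illustrated with a simple per-length model in Python. This is a sketch under stated assumptions rather than the TCAD-based model used in this work: it assumes a uniform intrinsic resistance per unit length, applies the reported 29% increase (7 nm C-SiO<sub>2</sub> dielectric) only to the covered portion, and uses a hypothetical per-length resistance in arbitrary units.

```python
def nanowire_resistance(r_per_nm, total_len_nm, covered_len_nm, increase_fraction):
    """Resistance of a nanowire partly covered by a Coaxial Routing structure.

    The covered portion is assumed to have its per-length resistance raised
    by `increase_fraction`; the uncovered portion keeps the intrinsic value.
    """
    if not 0.0 <= covered_len_nm <= total_len_nm:
        raise ValueError("covered length must lie within the nanowire length")
    uncovered_len_nm = total_len_nm - covered_len_nm
    return r_per_nm * (uncovered_len_nm + (1.0 + increase_fraction) * covered_len_nm)


R_PER_NM = 1.0     # hypothetical intrinsic resistance per nm, arbitrary units
LENGTH_NM = 100.0  # nanowire length used for the illustration
INCREASE = 0.29    # reported increase for the 7 nm C-SiO2 dielectric design

bare = nanowire_resistance(R_PER_NM, LENGTH_NM, 0.0, INCREASE)
half = nanowire_resistance(R_PER_NM, LENGTH_NM, 50.0, INCREASE)
full = nanowire_resistance(R_PER_NM, LENGTH_NM, LENGTH_NM, INCREASE)
print(f"half covered: +{100 * (half / bare - 1):.1f}%, fully covered: +{100 * (full / bare - 1):.1f}%")
```

Under this linear assumption, covering half of the nanowire gives half of the full-coverage penalty, which is why a layout with the coaxial metal over most of the nanowire length serves as the worst case.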

#### VII. CONCLUSION

This paper presents a fine-grained 3D CMOS IC technology based on a vertical nanowire template structure. S3DC provides better routability than state-of-the-art monolithic 3D approaches. Routing analysis has shown that S3DC eliminates routing congestion in all benchmarks studied. A system-level S3DC design and evaluation methodology using commercial CAD tools has been developed. The resulting benefits in large-scale benchmarks are very significant compared with the most fine-grained monolithic 3D integration direction; e.g., a 9.7X to 71X PPA improvement is observed for the benchmarks studied. Core fabric components have been validated with both detailed simulation and experiments.

#### REFERENCES

- [1] J. A. Burns et al., "A wafer-scale 3-D circuit integration technology," IEEE Trans. Electron Devices, vol. 53, no. 10, pp. 2507–2516, Sep. 2006.
- [2] J. Van Olmen et al., "3D stacked IC demonstration using a through silicon via first approach," in Proc. IEEE Int. Electron Devices Meeting, San Francisco, CA, USA, 2008, pp. 1–4.
- [3] M. Motoyoshi, "Through-silicon via (TSV)," Proc. IEEE, vol. 97, no. 1, pp. 1–4, Jan. 2009.
- [4] P. Batude et al., "Advances in 3D CMOS sequential integration," in Proc. IEEE Int. Electron Devices Meeting, Washington, D.C., USA, 2009, pp. 1–4.
- [5] Y.-J. Lee, P. Morrow, and S. K. Lim, "Ultra high density logic designs using transistor-level monolithic 3D integration," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Des.*, San Jose, CA, USA, 2012, pp. 539–546.
- [6] M. S. Ebrahimi et al., "Monolithic 3D integration advances and challenges: From technology to system levels," in Proc. SOI-3D-Subthreshold Microelectron. Technol. Unified Conf., Millbrae, CA, USA, 2014, pp. 1–2.
- [7] M. M. Shulaker et al., "Monolithic 3D integration of logic and memory: Carbon nanotube FETs, resistive RAM, and silicon FETs," in Proc. IEEE Int. Electron Devices Meeting, San Francisco, CA, USA, 2014, pp. 27.4.1–27.4.4.
- [8] S. Panth, S. Samal, Y. S. Yu, and S. K. Lim, "Design challenges and solutions for ultra-high-density monolithic 3D ICs," in *Proc. SOI-3D-Subthreshold Microelectron. Technol. Unified Conf.*, Millbrae, CA, USA, 2014, pp. 1–2.
- [9] M. Rahman, S. Khasanvis, J. Shi, M. Li, and C. A. Moritz, "Skybridge: 3-D integrated circuit technology alternative to CMOS," Apr. 2014. [Online]. Available: http://arxiv.org/abs/1404.0607
- [10] M. Rahman, S. Khasanvis, J. Shi, M. Li, and C. A. Moritz, "Architecting 3-D integrated circuit fabric with intrinsic thermal management features," in *Proc. IEEE/ACM Int. Symp. Nanoscale Archit.*, Boston, MA, USA, 2015, pp. 157–162.
- [11] S. Khasanvis, M. Rahman, M. Li, J. Shi, and C. A. Moritz, "Architecting connectivity for fine-grained 3-D vertically integrated circuits," in *Proc. IEEE/ACM Int. Symp. Nanoscale Archit.*, Boston, MA, USA, 2015, pp. 175–180.
- [12] M. Rahman, S. Khasanvis, J. Shi, M. Li, and C. A. Moritz, "Fine-grained 3-D integrated circuit fabric using vertical nanowires," in *Proc. Int. 3D Syst. Integr. Conf.*, San Francisco, CA, USA, 2015, pp. TS9.3.1–TS9.3.7.
- [13] M. Li, J. Shi, M. Rahman, S. Khasanvis, S. Bhat, and C. A. Moritz, "Skybridge-3D-CMOS: A vertically-composed fine-grained 3D CMOS integrated circuit technology," in *Proc. IEEE Comput. Soc. Annu. Symp. VLSI*, Pittsburgh, PA, USA, 2016, pp. 403–408.

- [14] C.-W. Lee, A. Afzalian, N. D. Akhavan, R. Yan, I. Ferain, and J.-P. Colinge, "Junctionless multigate field-effect transistor," *Appl. Phys. Lett.*, vol. 94, no. 5, Feb. 2009, Art. no. 053511.
- [15] C. Liu and S. K. Lim, "A design tradeoff study with monolithic 3D integration," in *Proc. IEEE Int. Symp. Qual. Electron. Des.*, Santa Clara, CA, USA, 2012, pp. 529–536.
- [16] E. Karl et al., "A 4.6GHz 162Mb SRAM design in 22 nm tri-gate CMOS technology with integrated active VMIN-enhancing assist circuitry," in Proc. IEEE Int. Solid-State Circuits Conf., San Francisco, CA, USA, 2012, pp. 230–232.
- [17] J. Shi, M. Li, S. Khasanvis, M. Rahman, and C. A. Moritz, "Routability in 3D IC design: Monolithic 3D vs. skybridge 3D CMOS," in *Proc. IEEE/ACM Int. Symp. Nanoscale Archit.*, Beijing, China, 2016, pp. 145–150.
- [18] M. M. Mirza *et al.*, "Nanofabrication of high aspect ratio (~50:1) sub-10 nm silicon nanowires using inductively coupled plasma etching," *J. Vacuum Sci. Technol. B*, vol. 30, no. 8, Sep. 2012, Art. no. 06FF02.
- [19] "DataFit." Oakdale Engineering, Oakdale, PA, USA, 2013. [Online]. Available: http://www.oakdaleengr.com/datafit.htm
- [20] "PTM RC Interconnect Models," Nanoscale Integration and Modeling (NIMO) Group, Arizona State University, Tempe, AZ, USA, 2005. [Online]. Available: http://ptm.asu.edu
- [21] "OpenCores," 2009. [Online]. Available: http://opencores.org.
- [22] P. Saxena, R. S. Shelar, and S. S. Sapatnekar, Routing Congestion in VLSI Circuits: Estimation and Optimization, New York, NY, USA: Springer-Verlag, 2007.
- [23] A. Rahman and R. Reif, "System-level performance evaluation of three-dimensional integrated circuits," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 8, no. 6, pp. 671–678, Dec. 2000.
- [24] B. Swahn and S. Hassoun, "Electro-thermal analysis of multi-fin devices," IEEE Trans. Very Large Scale Integr. Syst., vol. 16, no. 7, pp. 816–829, Jun. 2008.
- [25] K. Skadron, M. R. Stan, W. Huang, S. Velusamy, K. Sankaranarayanan, and D. Tarjan, "Temperature-aware computer systems: Opportunities and challenges," *IEEE Micro*, vol. 23, no. 6, pp. 52–61, Jan. 2003.
- [26] M. L. Minges, "Design considerations," in *Electronic Materials Handbook: Packaging*. Boca Raton, FL, USA: CRC Press, 1989, pp. 408–421.
- [27] J. M. Koo, S. Im, L. Jiang, and K. E. Goodson, "Integrated microchannel cooling for three-dimensional electronic circuit architectures," *J. Heat Transf.*, vol. 127, pp. 49–58, Jan. 2005.
- [28] J. S. Bintoro, A. Akbarzadeh, and M. Mochizuki, "A closed-loop electronics cooling by implementing single phase impinging jet and mini channels heat exchanger," *Appl. Therm. Eng.*, vol. 25, no. 17, pp. 2740–2753, Jun. 2005.
- [29] B. Yang, K. D. Buddharaju, S. H. G. Teo, N. Singh, G. Q. Lo, and D. L. Kwong, "Vertical silicon-nanowire formation and gate-all-around MOSFET," *IEEE Electron Device Lett.*, vol. 29, no. 7, pp. 791–794, Jul. 2008.
- [30] M. Rahman, J. Shi, M. Li, S. Khasanvis, and C. A. Moritz, "Manufacturing pathway and experimental demonstration for nanoscale fine-grained 3-D integrated circuit fabric," in *Proc. IEEE Nanotechnol.*, Rome, Italy, 2015, pp. 1214–1217.
- [31] P. Vijayakumar, P. Narayanan, I. Koren, C. M. Krishna, and C. A. Moritz, "Impact of nanomanufacturing flow on systematic yield losses in nanoscale fabrics," in *Proc. IEEE/ACM Int. Symp. Nanoscale Archit.*, San Diego, CA, USA, 2011, pp. 181–188.
- [32] T. Gupta, "Dielectric materials," in Copper Interconnect Technology. New York, NY, USA: Springer-Verlag, 2009, pp. 67–110.



Mingyu Li received the B.S. degree in automation engineering from the Shandong University, Jinan, China, in 2012, and the M.S.E.C.E. degree in 2015 from the University of Massachusetts Amherst, Amherst, MA, USA, where he is currently working toward the Ph.D. degree in electrical and computer engineering. His research interests include post-CMOS nanoscale fabrics and VLSI design. He has published his research in several peer-reviewed IEEE/ACM journals and conferences, where he also contributes as a reviewer.



Jiajun Shi received the B.Eng. degree from the University of Electronic Science and Technology of China, Chengdu, China, in 2012, and the M.S. degree in computer engineering in 2014 from the University of Massachusetts Amherst, Amherst, MA, USA, where he is currently working toward the Ph.D. degree in computer engineering. He is currently a Research Assistant in the Nanoscale Computing Fabrics Laboratory, University of Massachusetts Amherst. His research interests include nanoscale 3-D integration, beyond CMOS computer architectures, emerging devices, and nanoscale fabrication. His research has appeared in the IEEE/ACM International Conferences on Nanoscale Architectures 2012 and 2014. He is a reviewer of the IEEE TRANSACTIONS ON NANOTECHNOLOGY.



Mostafizur Rahman received the Ph.D. degree in electrical and computer engineering from the University of Massachusetts Amherst, Amherst, MA, USA. He was with the Department of Computer Science and Electrical Engineering (CSEE), University of Missouri Kansas City. He leads the Nanoscale Integrated Circuits (Nano-IC) Laboratory and is currently a Co-Lead for the Center for Interdisciplinary Nanoscale Research at CSEE. His group's research focuses on transformative approaches for nanoelectronics to surpass the current limitations of today's integrated circuits. He is currently a Publication Chair for NANOARCH and a Guest Editor for a special issue of the IEEE TRANSACTIONS ON NANOTECHNOLOGY. He is also a Program Committee Member for the NANOARCH and VLSI DESIGN conferences. In addition, he is currently a reviewer for TNANO, JETC, JPDC, NANOARCH, and other publications.

Santosh Khasanvis received the B.Tech. degree in computer engineering from Vellore Institute of Technology University, Vellore, India, in 2008, and the M.S. degree in computer engineering in 2012 from the University of Massachusetts Amherst, Amherst, MA, USA, where he is currently working toward the Ph.D. degree in computer engineering. He is currently a Research Assistant in the Nanoscale Computing Fabrics Laboratory, University of Massachusetts Amherst. His research interests include unconventional magneto-electric computing with emerging nanotechnology, post-CMOS computing fabrics, machine learning, nano-VLSI, vertical 3-D integration, and emerging nanoscale memories. He has published his research in several peer-reviewed IEEE, ACM, and Elsevier journals and conferences, where he also contributes as a reviewer. He received best paper awards at the IEEE/ACM INTERNATIONAL CONFERENCE ON NANOSCALE ARCHITECTURES in 2013 and 2014.



Sachin Bhat received the B.E. degree in electronics and communication engineering from the Visvesvaraya Technological University, Belagavi, India, in 2014. He is currently working toward the M.S. degree in electrical and computer engineering at the University of Massachusetts Amherst, Amherst, MA, USA. His research interests include post-CMOS nanoscale fabrics for neuromorphic computing and VLSI design.



Csaba Andras Moritz received the Ph.D. degree in computer systems from the Royal Institute of Technology, Stockholm, Sweden, in 1998. From 1997 to 2000, he was a Research Scientist with the Laboratory for Computer Science, The Massachusetts Institute of Technology (MIT), Cambridge, MA, USA. He has consulted for several technology companies in Scandinavia and held industrial positions ranging from CEO to CTO and to founder. His most recent company, BlueRISC Inc., develops security microprocessors, hardware-assisted security, and system assurance solutions for antitamper and cyber defense. He is currently a Professor in the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA, USA. His research interests include nanoelectronics and nanoscale systems, computer architecture, and security.